Hello,


Thank you for requesting a more detailed protocol for individual genes. 

I first downloaded from BioMart on Ensembl (https://www.ensembl.org/biomart/martview/509ba48 1 5fidb 1363eae85c37f90d5c91) the following information
for all the protein-coding genes (Ensembl release 83) of the human reference genome:

Ensembl.Gene.ID : Ensembl gene identifier
gene.symbol : gene name
Transcript : Ensembl transcript ID
Chrom : chromosome
Start_hg18 : start position on hg18
End_hg18 : end position on hg18
LgGenes : gene length
LgCDS : CDS length
GC : GC content of the CDS
GC3 : GC content at third codon positions
GCflank : GC content of flanking regions (10 kb upstream and 10 kb downstream of the transcription unit)
GCi : intronic GC content


In parallel, I downloaded the sex-averaged recombination map from HapMap release 22 from ftp://ftp.ncbi.nlm.nih.gov/hapmap/recombination/latest/rates/. I
recommend using the most recent version of the recombination map if you have just started your project on humans. You could also have a look at the
deCODE map.

The recombination rate R, in cM/Mb, is computed as R = (Gj - Gi) / (Pj - Pi) * 1e6, where Gj is the genetic position (in cM) of nucleotide j and Pj its physical
position (in bp), and likewise Gi and Pi for nucleotide i. We estimated the average intragenic recombination rate between the beginning (i) and the end (j) of each gene longer than 5 kb.
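
A minimal Python sketch of this computation, for illustration only. The formula and the 5 kb threshold come from the text above; the function names, the linear interpolation of genetic positions between map markers, and the clamping at the map edges are assumptions of this sketch, not a description of the scripts actually used.

```python
from bisect import bisect_left


def interpolate_cm(pos_bp, map_bp, map_cm):
    """Genetic position (cM) at pos_bp, by linear interpolation (an assumption).

    map_bp: marker positions in bp, sorted ascending.
    map_cm: genetic positions in cM at those markers.
    Positions outside the map are clamped to the first/last marker.
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    k = bisect_left(map_bp, pos_bp)  # map_bp[k-1] < pos_bp <= map_bp[k]
    x0, x1 = map_bp[k - 1], map_bp[k]
    y0, y1 = map_cm[k - 1], map_cm[k]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)


def recombination_rate(g_i, g_j, p_i, p_j):
    """R = (Gj - Gi) / (Pj - Pi) * 1e6, in cM/Mb (G in cM, P in bp)."""
    return (g_j - g_i) / (p_j - p_i) * 1e6


def intragenic_rate(start_bp, end_bp, map_bp, map_cm, min_length=5000):
    """Average rate between gene start (i) and end (j); None for genes <= 5 kb."""
    if end_bp - start_bp <= min_length:
        return None
    g_i = interpolate_cm(start_bp, map_bp, map_cm)
    g_j = interpolate_cm(end_bp, map_bp, map_cm)
    return recombination_rate(g_i, g_j, start_bp, end_bp)
```

Here `map_bp` and `map_cm` would hold, for one chromosome, the position and genetic-map columns of the recombination map file, and `start_bp`/`end_bp` the Start_hg18/End_hg18 values of a gene.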


For the expression data, we used the following datasets:

1. From Guo et al. (2015), we downloaded Panel 4 ("FPKM of pool-split PGCs") of Table S1, "Summary of Single-Cell RNA-Seq Dataset and
Expression Levels of RefSeq Genes in Human PGCs and Neighboring Somatic Cells".

2. From Kryuchkova-Mostacci N, Robinson-Rechavi M. (2015), we used the processed table "File_31_Hum_Data_Tissues_Fagerberg.txt", available as
supplementary material.

3. From Lesch et al. (2016), we downloaded the expression levels in PS and RS of 3 males from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68507.
These files are: GSM1673959 (human1_PS_RNA), GSM1673963 (human1_RS_RNA), GSM1673967 (human2_PS_RNA),
GSM1673971 (human2_RS_RNA), GSM1673975 (human3_PS_RNA) and GSM1673978 (human3_RS_RNA).


We combined all of this information into a single table using awk and bash scripting.
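
The combining step was done with awk and bash; the actual commands are not reproduced here. As a rough illustration of the idea only, here is a Python sketch of a left join between a gene table and an expression table. The function name, the choice of join key, the "NA" filler and the "FPKM" column are all hypothetical.

```python
def left_join(base_rows, extra_rows, key, missing="NA"):
    """Add the columns of extra_rows to base_rows, matching rows on `key`.

    Rows are dicts (e.g. as produced by csv.DictReader). Every row of
    base_rows is kept; genes absent from extra_rows get `missing` values.
    """
    # Columns to add are taken from the first extra row (hypothetical convention).
    extra_cols = [c for c in (extra_rows[0] if extra_rows else {}) if c != key]
    lookup = {row[key]: row for row in extra_rows}
    joined = []
    for row in base_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match[col] if match else missing
        joined.append(merged)
    return joined


# Toy example with made-up values; "FPKM" is a hypothetical column name.
genes = [
    {"gene.symbol": "geneA", "GC3": "0.61"},
    {"gene.symbol": "geneB", "GC3": "0.42"},
]
expression = [{"gene.symbol": "geneA", "FPKM": "12.5"}]
table = left_join(genes, expression, key="gene.symbol")
```

Repeating the join once per dataset yields one row per gene with all the annotations side by side, which is the shape a plotting script typically expects.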


Once the table was made, we used an R script to make the figures. The R script "figures_HumanCodonUsage_functions.R" and the README are
available on Zenodo: https://zenodo.org/record/835063#.XwRa499fg5k in the zipped folder fig_HumanCodonUsage.zip


For the GO analyses, the choice of GO categories (proliferation and differentiation) follows the paper of Gingold et al. (2014):
I followed their protocol and the legend of their PCA figure to decide whether a GO category is associated with proliferation, for instance. Once I
had the GO_* files prepared, I concatenated them, sorted them and extracted the unique gene names associated with proliferation (resp. differentiation), using a
combination of cat, sort and uniq in the terminal. Then I compared the 2 lists: if a gene name was present twice, I put it in the "both" category, while if it was
present once, it was assigned to either prolif or diff only.
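
The same logic can be sketched in Python (a stand-in for the cat, sort and uniq commands, not the commands themselves). The function names and the gene names in the example are hypothetical.

```python
def unique_genes(go_gene_lists):
    """Merge the gene lists of several GO categories into one sorted,
    duplicate-free list (the equivalent of cat | sort | uniq)."""
    genes = set()
    for names in go_gene_lists:
        genes.update(name.strip() for name in names if name.strip())
    return sorted(genes)


def classify_genes(prolif_genes, diff_genes):
    """Label each gene 'both' if it is in the two lists, else 'prolif' or 'diff'."""
    prolif, diff = set(prolif_genes), set(diff_genes)
    labels = {}
    for gene in sorted(prolif | diff):
        if gene in prolif and gene in diff:
            labels[gene] = "both"
        elif gene in prolif:
            labels[gene] = "prolif"
        else:
            labels[gene] = "diff"
    return labels


# Toy example: each inner list stands for the content of one GO_* file.
prolif_list = unique_genes([["geneA", "geneB"], ["geneB", "geneC"]])
diff_list = unique_genes([["geneC", "geneD"]])
labels = classify_genes(prolif_list, diff_list)
```

Deduplicating each list first matters: a gene listed in two proliferation categories must still count once, so that "present twice" can only mean "present in both lists".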


Please let me know if I have answered your questions.
Best, 
Fanny Pouyet 